
nsys perf and eval #2675

Merged
malay-nagda merged 3 commits into main from malay/nsys_perf_eval
Mar 10, 2026

Conversation

@malay-nagda
Collaborator

@malay-nagda malay-nagda commented Mar 6, 2026

What does this PR do ?

Evaluate perf between given boundaries.

Changelog

start = performance_config.get("eval_time_start_step")
    if start is None:
        start = max(1, int(len(steps) * performance_config.get("skip_first_percent_time", 0.1)))
    end = performance_config.get("eval_time_end_step")
    performance_result["metrics"]["current_avg_iter_time_ms"] = float(np.nanmean(current_iter_time_values[start:end]))
    performance_result["metrics"]["golden_avg_iter_time_ms"] = float(np.nanmean(golden_iter_time_values[start:end]))
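The slice in the snippet above follows standard Python semantics: start is inclusive, end is exclusive, and an end of None extends the window to the final step. A minimal sketch with made-up timings (the values and variable names are illustrative, not from the PR):

```python
import numpy as np

# Hypothetical per-step iteration times; the first step is warm-up and
# one step failed to record a timing (nan).
iter_time_ms = np.array([900.0, 520.0, 510.0, np.nan, 505.0, 515.0])

start, end = 1, None  # skip the warm-up step, keep everything after it
window = iter_time_ms[start:end]

# np.nanmean ignores nan entries, matching the averaging above.
avg = float(np.nanmean(window))
print(avg)  # 512.5
```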

GitHub Actions CI

See the CI section in the Contributing doc for how to trigger the CI. An NVIDIA developer will need to approve and trigger the CI for external contributors.

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you add or update any necessary documentation?
  • Does the PR affect components that are optional to install? (e.g. Numba, Pynini, Apex)
    • Reviewer: Does the PR have correct import guards for all optional libraries?

If you haven't finished some of the above items, you can still open a "Draft" PR.

Additional Information

  • Related to # (issue)

Summary by CodeRabbit

  • New Features
    • Added --eval_time_start_step and --eval_time_end_step configuration options for performance evaluation. These parameters let users specify an explicit evaluation window for timing averages, giving finer control over which steps are included in performance analysis.

Signed-off-by: Malay Nagda <malayn@nvidia.com>
@copy-pr-bot

copy-pr-bot bot commented Mar 6, 2026

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

Signed-off-by: Malay Nagda <malayn@nvidia.com>
@malay-nagda malay-nagda marked this pull request as ready for review March 6, 2026 09:27
@malay-nagda malay-nagda requested review from a team and erhoo82 as code owners March 6, 2026 09:27
@malay-nagda malay-nagda requested a review from ko3n1g March 6, 2026 09:28
Contributor

@ko3n1g ko3n1g left a comment


Should we just refactor this to only use eval window instead of skip_n_steps? I'm wondering if there will be a case in future where we still need skip_n_steps?

@malay-nagda
Collaborator Author

Should we just refactor this to only use eval window instead of skip_n_steps? I'm wondering if there will be a case in future where we still need skip_n_steps?

Wouldn't it be useful in the case of a variable number of steps, like for convergence testing, or when the set number of steps was not completed due to a time limit or a crash? skip_n_percent can still calculate based on the available steps.

@malay-nagda malay-nagda requested a review from ko3n1g March 9, 2026 15:29
@coderabbitai
Contributor

coderabbitai bot commented Mar 9, 2026

📝 Walkthrough


These changes introduce configurable timing window boundaries for performance evaluation. Two new CLI arguments (--eval_time_start_step and --eval_time_end_step) enable users to specify explicit step ranges for GPU utilization and iteration time averaging, overriding the previous percentage-based skipping logic when provided.

Changes

  • CLI Arguments (scripts/performance/argument_parser.py): Added --eval_time_start_step and --eval_time_end_step integer arguments to configure the timing evaluation window boundaries (0-indexed, start inclusive, end exclusive).
  • Configuration Propagation (scripts/performance/setup_experiment.py): Integrated the new timing window arguments into the performance configuration structure, passing them through to the performance_params dictionary when set.
  • Evaluation Logic (scripts/performance/utils/evaluate.py): Implemented windowing logic using the new start/end step indices for GPU utilization and iteration time averaging, replacing the previous percentage-based skip calculation and updating corresponding log messages to reflect explicit step ranges.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks: 2 passed, 2 failed

❌ Failed checks (1 warning, 1 inconclusive)

  • Test Results For Major Changes (⚠️ Warning): The PR modifies core performance evaluation logic affecting metric calculations (avg_iter_time_ms, gpu_util_values) but provides no documented test results or regression testing evidence. Resolution: Add test results documenting that the new eval_time_start_step/eval_time_end_step parameters work correctly, and provide regression test results confirming no behavioral changes to existing functionality.
  • Title check (❓ Inconclusive): The title "nsys perf and eval" is vague and uses abbreviated terms that don't clearly convey the specific change being made. Resolution: Use a more descriptive title that explains the main change, such as "Add configurable evaluation window boundaries for performance metrics" or "Support eval_time_start_step and eval_time_end_step in performance evaluation".

✅ Passed checks (2 passed)

  • Description Check (✅ Passed): Check skipped - CodeRabbit’s high-level summary is enabled.
  • Docstring Coverage (✅ Passed): Docstring coverage is 100.00%, which is sufficient. The required threshold is 80.00%.

Contributor

@coderabbitai coderabbitai bot left a comment


🧹 Nitpick comments (2)
scripts/performance/utils/evaluate.py (2)

684-689: Duplicated window calculation logic.

The start/end calculation at lines 684-687 duplicates the logic from validate_performance (lines 328-331). Consider extracting a helper function to ensure consistent behavior and reduce maintenance burden.

♻️ Proposed helper function extraction

Add a helper function at module level:

def _get_eval_window(config: Dict[str, Any], num_steps: int) -> tuple[int, int | None]:
    """Compute (start, end) indices for the evaluation window.
    
    Args:
        config: Performance config dict with optional eval_time_start_step,
                eval_time_end_step, and skip_first_percent_time keys.
        num_steps: Total number of steps.
    
    Returns:
        Tuple of (start_index, end_index) where end_index may be None.
    """
    start = config.get("eval_time_start_step")
    if start is None:
        start = max(1, int(num_steps * config.get("skip_first_percent_time", 0.1)))
    end = config.get("eval_time_end_step")
    return start, end

Then use it in both locations:

-    start = config.get("eval_time_start_step")
-    if start is None:
-        start = max(1, int(len(steps) * config["skip_first_percent_time"]))
-    end = config.get("eval_time_end_step")
+    start, end = _get_eval_window(config, len(steps))
     current_stable = current_gpu_util_values[start:end]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/performance/utils/evaluate.py` around lines 684 - 689, Extract the
duplicated start/end window logic into a module-level helper function (e.g.,
_get_eval_window(config: Dict[str, Any], num_steps: int) -> tuple[int,
Optional[int]]) that implements the same behavior as the duplicated blocks (use
config.get("eval_time_start_step"), fallback to max(1, int(num_steps *
config.get("skip_first_percent_time", 0.1))), and return eval_time_end_step as
end or None); then replace the duplicated logic in evaluate.py (the block
computing start/end before computing current_avg_iter_time_ms and
golden_avg_iter_time_ms) and the logic inside validate_performance with calls to
_get_eval_window(performance_config, len(steps)) to ensure consistent behavior.

328-333: Consider validating start and end bounds.

The current implementation doesn't validate that:

  1. start and end are non-negative (negative indices have different Python slice semantics)
  2. start < end (empty slice would cause np.nanmean to return nan)

While unlikely in practice, invalid CLI inputs could produce confusing behavior.

🛡️ Optional: Add bounds validation
     start = config.get("eval_time_start_step")
     if start is None:
         start = max(1, int(len(steps) * config["skip_first_percent_time"]))
+    elif start < 0:
+        raise ValueError(f"eval_time_start_step must be non-negative, got {start}")
     end = config.get("eval_time_end_step")
+    if end is not None and end < 0:
+        raise ValueError(f"eval_time_end_step must be non-negative, got {end}")
+    if end is not None and start >= end:
+        raise ValueError(f"eval_time_start_step ({start}) must be less than eval_time_end_step ({end})")
     current_stable = current_gpu_util_values[start:end]
     golden_stable = golden_gpu_util_values[start:end]
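The empty-slice failure mode described in point 2 is easy to reproduce: when start >= end the slice is empty, and np.nanmean returns nan (with a RuntimeWarning) instead of raising, so an invalid window would surface only as a nan metric downstream. A minimal illustration with made-up values:

```python
import warnings
import numpy as np

values = np.array([1.0, 2.0, 3.0])
start, end = 2, 1  # invalid window: start >= end

with warnings.catch_warnings():
    warnings.simplefilter("ignore", RuntimeWarning)  # "Mean of empty slice"
    result = float(np.nanmean(values[start:end]))  # empty slice -> nan

print(np.isnan(result))  # True
```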
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/performance/utils/evaluate.py` around lines 328 - 333, Validate and
clamp the computed start and end before slicing current_gpu_util_values and
golden_gpu_util_values: ensure start and end are integers >= 0, clamp them to
the valid range (0 .. len(steps)), and check start < end (raise a ValueError or
return a clear error) so you don't produce negative-index slices or empty
ranges; update the logic around config, start, end, steps,
current_gpu_util_values and golden_gpu_util_values to perform these checks and
fail fast with a clear message if inputs are invalid.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: e1756638-dd38-4019-891b-ec4db2365970

📥 Commits

Reviewing files that changed from the base of the PR and between d740eee and 9458a00.

📒 Files selected for processing (3)
  • scripts/performance/argument_parser.py
  • scripts/performance/setup_experiment.py
  • scripts/performance/utils/evaluate.py

@malay-nagda malay-nagda merged commit f532fcf into main Mar 10, 2026
24 of 26 checks passed
@malay-nagda malay-nagda deleted the malay/nsys_perf_eval branch March 10, 2026 09:45


3 participants